Abstract

The study of transportability aims to identify conditions under
which causal information learned from experiments can be reused
in a different environment where only passive observations can
be collected. The theory introduced in [Pearl and Bareinboim,
2011] (henceforth [PB, 2011]) defines formal conditions for such
transfer but falls short of providing an effective procedure for
deciding whether transportability is feasible for a given set of
assumptions about differences between the source and target
domains. This paper provides such a procedure. It establishes a
necessary and sufficient condition for deciding when causal
effects in the target domain are estimable from both the
statistical information available and the causal information
transferred from the experiments. The paper further provides a
complete algorithm for computing the transport formula, that is,
a way of fusing experimental and observational information to
synthesize an estimate of the desired causal relation.

Motivation 

The problem of transporting knowledge from one environment
to another has been pervasive in many data-driven
sciences.  Invariably,  when  experiments  are  performed  on  a 
group  of  subjects,  the  issue  arises  whether  the  conclusions 
are  applicable  to  a  different  but  somehow  related  group. 
When  a  robot  is  trained  in  a  simulated  environment,  the 
question  arises  whether  it  could  put  the  acquired  knowledge 
into  use  in  a  new  environment  where  relationships  among 
agents,  objects  and  features  are  different. 

Surprisingly, the conditions under which this extrapolation
can be legitimized were not formally articulated. Although
the problem has been discussed in many areas of
statistics, economics, and the health sciences, under rubrics
such as "external validity" [Campbell and Stanley, 1963;
Manski, 2007], "meta-analysis" [Glass, 1976; Hedges and
Olkin, 1985; Owen, 2009], "heterogeneity" [Hofler, Gloster,
and Hoyer, 2010], and "quasi-experiments" [Shadish, Cook, and
Campbell, 2002, Ch. 3; Adelman, 1991], these discussions
are limited to verbal narratives in the form of heuristic
guidelines for experimental researchers; no formal treatment of
the problem has been attempted.

AI is in a unique position to tackle this problem formally.
First, the distinction between statistical and causal knowledge
has received syntactic representation through causal diagrams
[Pearl, 1995; Spirtes, Glymour, and Scheines, 2001;
Pearl, 2009; Koller and Friedman, 2009]. Second, graphical
models provide a language for representing differences and
commonalities among domains, environments, and populations
[PB, 2011]. Finally, the inferential machinery provided
by the do-calculus [Pearl, 1995; 2009; Koller and Friedman,
2009] is particularly suitable for combining these two features
into a coherent framework and developing effective
algorithms for knowledge transfer.

Following [PB, 2011], we consider transferring causal
knowledge between two environments Π and Π*. In
environment Π, experiments can be performed and causal
knowledge gathered. In Π*, potentially different from Π,
only passive observations can be collected but no experiments
conducted. The problem is to infer a causal relationship
R in Π* using knowledge obtained in Π. Clearly, if
nothing is known about the relationship between Π and Π*,
the problem is unsolvable. Yet the fact that all experiments
are conducted with the intent of being used elsewhere (e.g.,
outside the laboratory) implies that scientific progress relies
on the assumption that certain environments share common
characteristics and that, owing to these commonalities, causal
claims would be valid even where experiments were never
performed.

To formally articulate commonalities and differences between
environments, a graphical representation named selection
diagrams was devised in [PB, 2011], which represent
differences in the form of unobserved factors capable
of causing such differences. Given an arbitrary selection
diagram, our challenge is to algorithmically decide whether
commonalities override differences to permit the transfer of
information across the two environments.

Previous  Work  and  Our  Contributions 

Consider Fig. 1(a), which concerns the transfer of experimental
results between two locations. We first conduct a randomized
trial in Los Angeles (LA) and estimate the causal
effect of treatment X on outcome Y for every age group
Z = z, denoted P(y | do(x), z). We now wish to generalize
the results to the population of New York City (NYC), but
we find the distribution P(x, y, z) in LA to be different from
the one in NYC (call the latter P*(x, y, z)). In particular, the
average age in NYC is significantly higher than that in LA.
How are we to estimate the causal effect of X on Y in NYC,
denoted R = P*(y | do(x))? (See footnotes 1 and 2.)

The selection diagram for this example (Fig. 1(a)) conveys
the assumption that the only difference between the two
populations is in the factors determining age distributions, shown
as S → Z, while age-specific effects P(y | do(x), Z = z) are
invariant across cities. Difference-generating factors are
represented by a special set of variables called selection
variables S (or simply S-variables), which are graphically
depicted as square nodes (■). From this assumption, the overall
causal effect in NYC can be derived as follows (see footnote 3):

$$
R = \sum_z P^*(y \mid do(x), z)\, P^*(z)
  = \sum_z P(y \mid do(x), z)\, P^*(z)
$$

The last line is the transport formula for R. It combines
experimental results obtained in LA, P(y | do(x), z), with
observational aspects of the NYC population, P*(z), to obtain
an experimental claim P*(y | do(x)) about NYC.
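
The re-calibration expressed by this transport formula can be sketched numerically. The snippet below is only an illustration: the age groups, the age-specific effects, and both age distributions are hypothetical numbers chosen for the example, and the function name `transport_effect` is not taken from the paper.

```python
def transport_effect(p_y_do_x_given_z, p_star_z):
    """Return sum_z P(y | do(x), z) * P*(z) for a fixed x and y."""
    if set(p_y_do_x_given_z) != set(p_star_z):
        raise ValueError("both tables must cover the same age groups")
    return sum(p_y_do_x_given_z[z] * p_star_z[z] for z in p_star_z)


# Hypothetical age-specific effects P(y | do(x), z) estimated in the LA trial.
effect_la = {"young": 0.2, "middle": 0.4, "old": 0.7}

# Hypothetical age distributions: P(z) in LA and P*(z) in NYC.
age_la = {"young": 0.5, "middle": 0.3, "old": 0.2}
age_nyc = {"young": 0.2, "middle": 0.3, "old": 0.5}

# Same experimental table, re-weighted by each city's age distribution.
effect_in_la = transport_effect(effect_la, age_la)
effect_in_nyc = transport_effect(effect_la, age_nyc)
print(effect_in_la, effect_in_nyc)
```

Only the weights change between the two calls; the experimental table is reused as is, which is exactly the invariance assumption encoded by the absence of an S-node pointing to Y.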

In this trivial example the transport formula amounts to
a simple re-calibration of the age-specific effects to account
for the new age distribution. In more elaborate examples,
however, the full power of formal analysis would be required.
For instance, [PB, 2011] showed that, in the problem
depicted in Fig. 1(b), where both the Z-determining
mechanism and the U-determining mechanism are suspected
of being different, the transport formula for the relation
R = P*(y | do(x)) is given by

$$
\sum_z P(y \mid do(x), z) \sum_w P^*(z \mid w) \sum_t P(w \mid do(x), t)\, P^*(t)
$$

This formula instructs us to estimate P(y | do(x), z) and
P(w | do(x), t) in the experimental domain, then combine
them with the estimates of P*(z | w) and P*(t) in the target
domain.

[PB, 2011] derived this formula using the following theorem,
which translates the property of transportability to the
existence of a syntactic reduction using a sequence of
do-calculus operations.

1 We will use P_x(y) interchangeably with P(y | do(x)).

2 We use the structural interpretation of causal diagrams. For
example, Fig. 1(a) describes the following system of structural
equations: z ← f1(s, u_z, u_zx), x ← f2(z, u_x, u_zx), y ←
f3(x, z, u_y, u_xy); each variable on the l.h.s. is assigned a value
given by the respective deterministic function on the r.h.s. The
exogenous (hidden) variables U are assigned a probability function
which induces, in turn, a probability distribution on all variables
in the model. See Appendix 1 for a gentle introduction to the
do-calculus and more details on this representation.

3 This result can be derived by purely graphical operations
if we write P*(y | do(x), z) as P(y | do(x), z, s), thus attributing
the difference between Π and Π* to a fictitious event S = s.
The invariance of the age-specific effect then follows from the
conditional independence (S ⊥⊥ Y | Z, X) in G_{\overline{X}}, which implies
P(y | do(x), z, s) = P(y | do(x), z), and licenses the derivation of
the transport formula.


Figure 1: Selection diagrams illustrating different facets
of the transportability problem. (a) A selection diagram in
which transportability is trivial. (b) A selection diagram in
which the causal relation R is more involved and shown to
be transportable using Theorem 1. (c) A selection diagram
in which the procedure given in [PB, 2011] is unable to
recognize a transportable relation R.

Theorem 1 (Do-calculus characterization [PB, 2011]). Let
D be the selection diagram characterizing Π and Π*, and
S a set of selection variables in D. The relation R =
P*(y | do(x), z) is transportable from Π to Π* if and only if
the expression P(y | do(x), z, s) is reducible, using the rules
of do-calculus, to an expression in which S appears only as
a conditioning variable in do-free terms.

Theorem 1 is declarative but not computationally effective,
for it does not specify the sequence of rules leading to
the needed reduction, nor does it tell us if such a sequence
exists.

To overcome this deficiency, [PB, 2011] proposed a recursive
procedure (their Theorem 3) which can handle many examples,
among them Fig. 1(b), but is not complete. We will
show (see footnote 4), for example, that their procedure fails
to recognize R as "transportable" in the diagram of Fig. 1(c),
whereas the procedures developed in this paper will recognize
it as such and will support it with the transport formula:

$$
R = \sum_z P(z \mid do(x)) \sum_w P(w \mid do(x, z)) \sum_v P^*(v \mid w)\, P^*(y \mid v, w)
$$

We  summarize  our  contributions  as  follows: 

• We derive a general graphical condition for deciding
transportability of causal effects. We show that transportability
is feasible if and only if a certain graph structure
does not appear as an edge subgraph of the input selection
diagram.

• We provide necessary or sufficient graphical conditions
for special cases of transportability, for instance,
controlled direct effects.

• Finally, we construct a complete algorithm for deciding
transportability of joint causal effects and returning the
correct transport formula whenever those effects are
transportable.

Preliminary  Results 

The basic semantical framework in our analysis rests on
probabilistic causal models as defined in [Pearl, 2000, p.
205], also called structural causal models or data-generating
models. In the structural causal framework [Pearl, 2000, Ch.
7], actions are modifications of functional relationships, and
each action do(x) on a causal model M produces a new
model M_x = (U, V, F_x, P(U)), where F_x is obtained after
replacing f_X ∈ F for every X ∈ X with a new function
that outputs a constant value x given by do(x).

4 See Corollary 4 in Appendix 2.
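
The construction of M_x from M can be sketched in a few lines of Python. This is only an illustrative toy, not the formalism of [Pearl, 2000]: the class name `SCM`, the three binary mechanisms, and the exogenous probabilities are all assumptions made for the example.

```python
import random


class SCM:
    """Toy structural causal model; mechanisms are evaluated in insertion order."""

    def __init__(self, functions, sample_u):
        self.functions = dict(functions)  # variable name -> f(values, u)
        self.sample_u = sample_u          # callable: rng -> dict of exogenous values

    def do(self, **assignments):
        """Return M_x: same U and P(U), with f_X replaced by a constant for each X."""
        new_functions = dict(self.functions)
        for name, value in assignments.items():
            # Bind `value` as a default argument so each lambda keeps its own constant.
            new_functions[name] = lambda values, u, value=value: value
        return SCM(new_functions, self.sample_u)

    def sample(self, rng):
        u = self.sample_u(rng)
        values = {}
        for name, f in self.functions.items():
            values[name] = f(values, u)
        return values


def sample_u(rng):
    # Hypothetical exogenous distribution P(U).
    return {"uz": rng.random() < 0.5, "ux": rng.random() < 0.3, "uy": rng.random() < 0.1}


model = SCM(
    {
        "Z": lambda v, u: int(u["uz"]),
        "X": lambda v, u: int(v["Z"] ^ u["ux"]),
        "Y": lambda v, u: int((v["X"] & v["Z"]) | u["uy"]),
    },
    sample_u,
)

intervened = model.do(X=1)  # the submodel M_x with x = 1
rng = random.Random(0)
samples = [intervened.sample(rng) for _ in range(1000)]
```

Note that `do` leaves the original model and the mechanisms of Z and Y untouched; only the function assigned to X is swapped for a constant.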

Key to the analysis of transportability is the notion of
"identifiability," defined below, which expresses the requirement
that causal effects be computable from a combination
of data P and assumptions embodied in a causal graph G.

Definition 1 (Causal Effects Identifiability [Pearl, 2000, p.
77]). The causal effect of an action do(x) on a set of variables
Y such that Y ∩ X = ∅ is said to be identifiable from
P in G if P_x(y) is uniquely computable from P(V) in any
model that induces G.

Causal models and their induced graphs are normally associated
with one particular domain (also called setting,
study, population, environment). In the transportability case,
we extend this representation to capture properties of several
domains simultaneously. This is made possible if we
assume that there are no structural changes between the domains,
that is, all structural equations share the same set of
arguments, though the functional forms of the equations may
vary arbitrarily (see footnotes 5 and 6).

Definition 2 (Selection Diagram). Let (M, M*) be a pair
of structural causal models [Pearl, 2000, p. 205] relative
to domains (Π, Π*), sharing a causal diagram G. (M, M*)
is said to induce a selection diagram D if D is constructed
as follows:

1. Every edge in G is also an edge in D;

2. D contains an extra edge S_i → V_i whenever there might
exist a discrepancy f_i ≠ f_i* or P(U_i) ≠ P*(U_i) between
M and M*.

In words, the S-variables locate the mechanisms where
structural discrepancies between the two domains are suspected
to take place (see footnote 7). Alternatively, one can see a
selection diagram as a carrier of invariance claims between the
mechanisms of both domains: the absence of a selection
node pointing to a variable represents the assumption that
the mechanism responsible for assigning value to that variable
is the same in the two domains.
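
The two construction steps of Definition 2 can be sketched as follows. The edge-set representation, the function name `selection_diagram`, and the `S_<variable>` naming of selection nodes are assumptions made for illustration; the example graph is only loosely modeled on Fig. 1(a) and ignores hidden variables.

```python
def selection_diagram(edges, discrepant):
    """Build D from the shared causal diagram G.

    edges:      set of (parent, child) pairs describing G
    discrepant: variables whose mechanism is suspected to differ across domains
    """
    nodes = {n for edge in edges for n in edge}
    unknown = set(discrepant) - nodes
    if unknown:
        raise ValueError(f"not in the causal diagram: {sorted(unknown)}")

    d_edges = set(edges)               # step 1: every edge of G is an edge of D
    selection_nodes = {}
    for v in sorted(discrepant):       # step 2: add S_i -> V_i for each suspect mechanism
        s = f"S_{v}"
        selection_nodes[v] = s
        d_edges.add((s, v))
    return d_edges, selection_nodes


# Hypothetical example: only the mechanism of Z (age) may differ between domains.
g_edges = {("Z", "X"), ("Z", "Y"), ("X", "Y")}
d_edges, s_nodes = selection_diagram(g_edges, {"Z"})
```

Variables that receive no selection node (here X and Y) carry the invariance claim: their mechanisms are assumed identical in both domains.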

Armed with a selection diagram and the concept of
identifiability, transportability of causal effects (or
transportability, for short) can be defined as follows:

Definition 3 (Causal Effects Transportability). Let D be a
selection diagram relative to domains (Π, Π*). Let (P, I)
be the pair of observational and interventional distributions
of Π, and P* be the observational distribution of Π*. The
causal effect R = P*_x(y) is said to be transportable from Π
to Π* in D if P*_x(y) is uniquely computable from P, P*, I in
any model that induces D.

5 This definition was left implicit in [PB, 2011].

6 The assumption that there are no structural changes between
domains can be relaxed starting with D = G* and adding S-nodes
following the same procedure as in Def. 2, while enforcing
acyclicity.

7 Transportability assumes that enough structural knowledge
about both domains is known in order to substantiate the
production of their respective causal diagrams. In the absence of
such knowledge, causal discovery algorithms can be used to infer
the diagrams from data [Pearl and Verma, 1991; Pearl, 2000;
Spirtes, Glymour, and Scheines, 2001].

The problem of transportability generalizes the problem
of identifiability. To see this, note that all identifiable causal
relations in (G*, P*) are also transportable, because they
can be computed directly from Π* and require no experimental
information from Π. This observation engenders the
following definition of trivial transportability.

Definition 4. (Trivial Transportability)

A causal relation R is said to be trivially transportable from
Π to Π* if R(Π*) is identifiable from (G*, P*).

The following observation establishes another connection
between identifiability and transportability. For a given
causal diagram G, one can produce a selection diagram D
such that identifiability in G is equivalent to transportability
in D. First set D = G, and then add selection nodes pointing
to all variables in D, which represents that the target domain
does not share any commonality with its pair. This is equivalent
to the problem of identifiability because the only way
to achieve transportability is to identify R from scratch in
the target domain.

Another special case of transportability occurs when a
causal relation has identical form in both domains, so that no
recalibration is needed. This is captured by the following
definition.

Definition 5. (Direct Transportability)

A causal relation R is said to be directly transportable from
Π to Π* if R(Π*) = R(Π).

A graphical test for direct transportability of R =
P(y | do(x), z) follows from do-calculus and reads: (S ⊥⊥
Y | X, Z) in G_{\overline{X}}; in words, X blocks all paths from S to Y once
we remove all arrows pointing to X and condition on Z. As
a concrete example, the z-specific effects in Fig. 1(a) are the
same in both domains, hence they are directly transportable.
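
The graphical test above can be sketched with a standard d-separation check (ancestral subgraph, moralization, then reachability). This is a minimal illustration under stated assumptions: the diagram is given as a list of directed edges, hidden common causes must be written as explicit latent parent nodes, and the function names are hypothetical. The first example graph is loosely modeled on Fig. 1(a), the second on the s-bow arc.

```python
from collections import deque


def ancestors(nodes, edges):
    """Return `nodes` together with all of their ancestors."""
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(edges, xs, ys, zs):
    """True if xs and ys are d-separated given zs in the DAG `edges`."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(xs | ys | zs, edges)           # ancestral subgraph
    adj = {n: set() for n in keep}
    parents = {}
    for p, c in edges:
        if p in keep and c in keep:
            adj[p].add(c)
            adj[c].add(p)
            parents.setdefault(c, set()).add(p)
    for ps in parents.values():                     # moralize: marry co-parents
        for a in ps:
            adj[a] |= ps - {a}
    frontier = deque(xs - zs)                       # search while avoiding zs
    seen = set(frontier)
    while frontier:
        n = frontier.popleft()
        if n in ys:
            return False
        for m in adj[n] - zs - seen:
            seen.add(m)
            frontier.append(m)
    return True


def directly_transportable(edges, s_nodes, y, x, z):
    """Check (S independent of Y given X, Z) after removing arrows into X."""
    x = set(x)
    mutilated = [(p, c) for p, c in edges if c not in x]
    return d_separated(mutilated, s_nodes, y, x | set(z))


fig_1a = [("S", "Z"), ("Z", "X"), ("Z", "Y"), ("X", "Y")]
s_bow = [("X", "Y"), ("S", "Y"), ("U", "X"), ("U", "Y")]
ok_1a = directly_transportable(fig_1a, {"S"}, {"Y"}, {"X"}, {"Z"})
ok_bow = directly_transportable(s_bow, {"S"}, {"Y"}, {"X"}, set())
```

In the first graph the only route from S to Y passes through the conditioned variable Z, so the test succeeds; in the second, S points directly at Y and the test fails.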

These two cases will act as a basis to decompose the problem
of transportability into smaller and more manageable
subproblems (to be shown later on).

The following lemma provides an auxiliary tool to prove
non-transportability and is based on refuting the uniqueness
property required by Definition 3.

Lemma 1. Let X, Y be two disjoint sets of variables in
populations Π and Π*, and let D be the selection diagram.
P*_x(y) is not transportable from Π to Π* if there exist two
causal models M1 and M2 compatible with D such that
P1(V) = P2(V), P1*(V) = P2*(V), P1(V \ W | do(W)) =
P2(V \ W | do(W)) for any set W, all families have positive
distribution, and P1*(y | do(x)) ≠ P2*(y | do(x)).

Proof. Let I be the set of interventional distributions P(V \
W | do(W)), for any set W. The latter inequality rules out
the existence of a function from P, P*, I to P*_x(y). □

While the problems of identifiability and transportability
are related, Lemma 1 indicates that proofs of
non-transportability are more involved than those of
non-identifiability. Indeed, to prove non-transportability requires
the construction of two models agreeing on (P, I, P*), while
non-identifiability requires the two models to agree solely on
the observational distribution P.

The simplest non-transportable structure is an extension
of the famous 'bow arc' graph, named here 's-bow arc' (see
Fig. 2(a)). The s-bow arc has two endogenous nodes, X and
its child Y, sharing a hidden exogenous parent U, and an
S-node pointing to Y. This and similar structures that prevent
transportability will be useful in our proof of completeness,
which requires a demonstration that whenever the algorithm
fails to transport a causal relation, the relation is indeed
non-transportable.

Theorem 2. P*_x(y) is not transportable in the s-bow arc
graph.

Proof. The proof will show a counter-example to the
transportability of P*_x(y) through two models M1 and M2 that
agree in (P, P*, I) and disagree in P*_x(y).

Assume that all variables are binary. Let the model M1
be defined by the following system of structural equations:
X1 = U, Y1 = ((X ⊕ U) ⊕ S), P1(U) = 1/2, and M2 by the
following one: X2 = U, Y2 = S ∨ (X ⊕ U), P2(U) = 1/2,
where ⊕ represents the exclusive or function.

Lemma 2. The two models agree in the distributions
(P, P*, I).

Proof. We show that the following equations must hold for
M1 and M2:

$$
\begin{aligned}
P_1(X \mid S) &= P_2(X \mid S), \quad S \in \{0, 1\} \\
P_1(Y \mid X, S) &= P_2(Y \mid X, S), \quad S \in \{0, 1\} \\
P_1(Y \mid do(X), S = 0) &= P_2(Y \mid do(X), S = 0)
\end{aligned}
$$

for all values of X, Y. The equality between P_i(X | S) is
obvious since (S ⊥⊥ X) and X has the same structural form in
both models. Second, let us construct the truth table for Y
from the structural equations of the two models:

X  S  U  |  Y1 = (X ⊕ U) ⊕ S  |  Y2 = S ∨ (X ⊕ U)
0  0  0  |  0                 |  0
0  0  1  |  1                 |  1
0  1  0  |  1                 |  1
0  1  1  |  0                 |  1
1  0  0  |  1                 |  1
1  0  1  |  0                 |  0
1  1  0  |  0                 |  1
1  1  1  |  1                 |  1

To show that the equality between P_i(Y = 1 | X, S = 0),
X ∈ {0, 1}, holds, we rewrite it as follows:

$$
P_i(Y = 1 \mid X, S = 0) =
\frac{P_i(Y = 1 \mid X, S = 0, U = 1)\, P_i(X \mid U = 1)\, P_i(U = 1)}{P_i(X)}
+ \frac{P_i(Y = 1 \mid X, S = 0, U = 0)\, P_i(X \mid U = 0)\, P_i(U = 0)}{P_i(X)}
\tag{1}
$$

In eq. (1), the expressions for X ∈ {0, 1} are functions of
the tuples {(X = 1, S = 0, U = 1), (X = 0, S = 0, U = 0)},
which evaluate to the same value in both models. Similarly,
the expressions P_i(Y = 1 | X, S = 1) for X ∈ {0, 1}
are functions of the tuples {(X = 1, S = 1, U = 1), (X =
0, S = 1, U = 0)}, which also evaluate to the same value in
both models.


Figure 2: (a) Smallest selection diagram in which
P(y | do(x)) is not transportable (s-bow graph). (b) A selection
diagram in which, even though there is no S-node pointing
to Y, the effect of X on Y is still not transportable due
to the presence of a sC-tree (see Corollary 2).

We further assert the equality between the interventional
distributions in Π, which can be written using the
do-calculus as

$$
\begin{aligned}
P_i(Y = 1 \mid do(X), S = 0)
&= \sum_u P_i(Y = 1 \mid do(X), S = 0, U = u)\, P_i(U = u \mid do(X), S = 0) \\
&= P_i(Y = 1 \mid X, S = 0, U = 1)\, P_i(U = 1)
 + P_i(Y = 1 \mid X, S = 0, U = 0)\, P_i(U = 0), \quad X \in \{0, 1\}
\end{aligned}
\tag{2}
$$

Evaluating this expression points to the tuples {(X =
1, S = 0, U = 1), (X = 1, S = 0, U = 0)} and {(X =
0, S = 0, U = 1), (X = 0, S = 0, U = 0)}, which map to
the same value in both models. □

Lemma 3. There exist values of X, Y such that
P1(Y | do(X), S = 1) ≠ P2(Y | do(X), S = 1).

Proof. Fix X = 1, Y = 1, and let us rewrite the desired
quantity R_i = P_i(Y = 1 | do(X = 1), S = 1) in Π* as

$$
R_i = P_i(Y = 1 \mid X = 1, S = 1, U = 1)\, P_i(U = 1)
    + P_i(Y = 1 \mid X = 1, S = 1, U = 0)\, P_i(U = 0)
\tag{3}
$$

Since R_i is a function of the tuples (X = 1, S = 1, U =
1), (X = 1, S = 1, U = 0), it evaluates in M1 to {1, 0} and
in M2 to {1, 1}.

Hence, together with the uniformity of P(U), it follows
that R1 = 1/2 and R2 = 1, which finishes the proof. □

Lemmas 1, 2, and 3 together prove Theorem 2. □
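
The counter-example can also be checked by brute-force enumeration. The sketch below encodes the two structural equations for Y exactly as stated in the proof (with X = U observationally and P(U = 1) = 1/2) and uses exact fractions; the function names are hypothetical, and only interventions on X are enumerated rather than the full family I.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def y1(x, u, s):
    """Model M1: Y = (X xor U) xor S."""
    return (x ^ u) ^ s


def y2(x, u, s):
    """Model M2: Y = S or (X xor U)."""
    return s | (x ^ u)


def observational(y_fn, s):
    """P(X = x, Y = y) in the domain indexed by s, where X = U."""
    dist = {}
    for u in (0, 1):
        x = u
        key = (x, y_fn(x, u, s))
        dist[key] = dist.get(key, Fraction(0)) + HALF
    return dist


def interventional(y_fn, x, s):
    """P(Y = 1 | do(X = x)) in the domain indexed by s."""
    return sum(HALF * y_fn(x, u, s) for u in (0, 1))


# Rows are (X, S, U, Y under M1, Y under M2).
truth_table = [(x, s, u, y1(x, u, s), y2(x, u, s)) for x, s, u in product((0, 1), repeat=3)]

agree_p = observational(y1, 0) == observational(y2, 0)        # P in the source
agree_p_star = observational(y1, 1) == observational(y2, 1)   # P* in the target
agree_i = all(interventional(y1, x, 0) == interventional(y2, x, 0) for x in (0, 1))

r1 = interventional(y1, 1, 1)   # P*_1(Y = 1 | do(X = 1))
r2 = interventional(y2, 1, 1)   # P*_2(Y = 1 | do(X = 1))
print(agree_p, agree_p_star, agree_i, r1, r2)
```

The two models are indistinguishable on everything that can be measured (P, P*, and the source experiments on X), yet their target-domain effects r1 and r2 differ, which is the situation Lemma 1 requires.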

Characterizing  Transportable  Relations 

The concept of confounded components (or C-components)
was introduced in [Tian and Pearl, 2002] to represent clusters
of variables connected through bidirected edges, and
was instrumental in establishing a number of conditions for
ordinary identification (Def. 1). If G is not a C-component
itself, it can be uniquely partitioned into a set C(G) of
C-components. We now recast C-components in the context of
transportability (see footnote 8).
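
The partition C(G) can be computed as the connected components of the graph restricted to its bidirected edges. The sketch below is a minimal illustration; the function name and the edge-list representation are assumptions. The example uses the bidirected arcs X ↔ Z and V ↔ Y, which is how the walkthrough of Fig. 1(c) later in the paper describes that diagram's confounding.

```python
def c_components(nodes, bidirected):
    """Partition `nodes` into maximal sets connected through bidirected edges."""
    adj = {n: set() for n in nodes}
    for a, b in bidirected:
        if a in adj and b in adj:      # ignore arcs touching removed nodes
            adj[a].add(b)
            adj[b].add(a)
    components, seen = [], set()
    for start in nodes:                # iterate in input order for stable output
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        components.append(comp)
    return components


bidirected = [("X", "Z"), ("V", "Y")]
full = c_components(["X", "Z", "W", "V", "Y"], bidirected)
without_x = c_components(["Z", "W", "V", "Y"], bidirected)
```

Removing X splits the component {X, Z}, leaving the three blocks {Z}, {W}, and {V, Y}.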

Definition 6 (sC-component). Let G be a selection diagram
such that a subset of its bidirected arcs forms a spanning tree
over all vertices in G. Then G is a sC-component (selection
confounded component).

8 C-components can themselves be seen as an extension of the more
elementary notion of inducing path, which was introduced much
earlier in [Verma and Pearl, 1990].


A special subset of C-components that embraces the ancestral
set of Y was noted by [Shpitser and Pearl, 2006b] to
play an important role in deciding identifiability. This
observation can also be applied to transportability, as formulated
in the next definition.

Definition 7 (sC-tree). Let G be a selection diagram such
that C(G) = {G}, all observable nodes have at most one
child, there is a node Y, which is a descendant of all nodes,
and there is a selection node pointing to Y. Then G is called
a Y-rooted sC-tree (selection confounded tree).

The presence of this structure (and generalizations) will
prove to be an obstacle to transportability of causal effects.
For instance, the s-bow arc in Fig. 2(a) is a Y-rooted sC-tree
in which we know P*_x(y) is non-transportable.

In certain classes of problems, the absence of such structures
will prove sufficient for transportability. One such class
is explored below, and consists of models in which the set X
coincides with the parents of Y.

Theorem 3. Let G be a selection diagram. Then for any
node Y, the causal effect P*_{Pa(Y)}(y) is transportable if
there is no subgraph of G which forms a Y-rooted sC-tree.

Proof.  See  Appendix  2.  □ 

Theorem 3 provides a tractable transportability condition
for the Controlled Direct Effect (CDE), a key concept in
modern mediation analysis, which permits the decomposition
of effects into their direct and indirect components
[Pearl, 2001; 2012]. CDE is defined as the effect of X on
Y when all other parents of Y are held constant, and it is
identifiable if and only if P_{Pa(Y)}(y) is identifiable [Pearl,
2009, p. 128].

The selection diagram in Fig. 1(a) does not contain any
Y-rooted sC-trees as subgraphs, and therefore the direct effect
(the causal effect of Y's parents on Y) is indeed transportable.
In fact, the transportability of CDE can be determined by a
more visible criterion:

Corollary 1. Let G be a selection diagram. Then for any
node Y, the direct effect P*_{Pa(Y)}(y) is transportable if there
is no S node pointing to Y.

Proof.  See  Appendix  2.  □ 

Generalizing to arbitrary effects, the following result provides
a necessary condition for transportability whenever the
whole graph is a sC-tree.

Theorem 4. Let G be a Y-rooted sC-tree. Then the effects
of any set of nodes in G on Y are not transportable.

Proof. See Appendix 2. □

The next corollary demonstrates that sC-trees are obstacles
to the transportability of P*_x(y) even when they do not
involve Y; that is, transportability is not a local problem. If
there exists a node W that is an ancestor of Y but not
necessarily "near" it, transportability is still prohibited (see
Fig. 2(b)). This fact anticipates that transporting causal effects
of singleton Y is not necessarily easier than the general
problem of transportability.


Figure 3: Example of a selection diagram in which
P(Y | do(X)) is not transportable; there is no sC-tree but
there is a s*-tree.

Corollary 2. Let G be a selection diagram, and X and Y
sets of variables. If there exists a node W that is an ancestor
of some node Y ∈ Y such that there exists a W-rooted
sC-tree which contains any variables in X, then P*_x(y) is not
transportable.

Proof.  See  Appendix  2.  □ 

We now generalize the definition of sC-trees (and Theorem 4)
in two ways: first, Y is augmented and can be a
set of variables; second, S-nodes can point to any variable
within the sC-component, not necessarily to root nodes. For
instance, consider the graph G in Fig. 3. Note that there is no
Y-rooted sC-tree nor W-rooted sC-tree in G (where W is
an ancestor of Y), and so the previous results cannot be applied
even though the effect of X on Y is not transportable
in G. Still, there exists a Y-rooted s*-tree in G, which will
prevent the transportability of the causal effect.

Definition 8 (s*-tree). Let G be a selection diagram, where
Y is the maximal root set. Then G is a Y-rooted s*-tree if G
is a sC-component, all observable nodes have at most one
child, and there is a selection node pointing to some vertex
of G (not necessarily in Y).

We next introduce a structure that witnesses
non-transportability, characterized by a pair of s*-trees.
Transportability will be shown impossible whenever
such a structure exists as an edge subgraph of the given
selection diagram.

Definition 9 (s-hedge). Let X, Y be sets of variables in G.
Let F, F' be R-rooted s*-trees such that F ∩ X ≠ ∅, F' ∩
X = ∅, F' ⊂ F, R ⊂ An(Y) in G_{\overline{X}}. Then F and F' form a
s-hedge for P*_x(y) in G.

For instance, in Fig. 3, the s*-trees F' = {C, Y} and
F = F' ∪ {X, A, B} form a s-hedge to P*_x(y).

We state below the formal connection between s-hedges
and non-transportability.

Theorem 5. Assume there exist F, F' that form a s-hedge
for P*_x(y) in Π and Π*. Then P*_x(y) is not transportable from
Π to Π*.

Proof.  See  Appendix  2.  □ 

To prove that s-hedges characterize non-transportability
in selection diagrams, we construct in
the next section an algorithm that transports any causal
effect whose selection diagram does not contain a s-hedge.


A  Complete  Algorithm  For  Transportability  of 
Joint  Effects 

The algorithm proposed to solve transportability is called
sID (see Fig. 4) and extends previous analysis and algorithms
of identifiability given in [Pearl, 1995; Kuroki and
Miyakawa, 1999; Tian and Pearl, 2002; Shpitser and Pearl,
2006b; Huang and Valtorta, 2006]. We build on two observations
developed throughout the paper:

(i) Transportability: Causal relations can be partitioned
into trivially and directly transportable.

(ii) Non-transportability: The existence of a s-hedge as an
edge subgraph of the input selection diagram can be
used to prove non-transportability.

The algorithm sID first applies the typical c-component
decomposition on top of the input selection diagram D,
partitioning the original problem into smaller blocks (call these
blocks sc-factors) until either the entire expression is
transportable, or it runs into the problematic s-hedge structure.

More specifically, for each sc-factor Q, sID tries to directly
transport Q. If it fails, sID tries to trivially transport
Q, which is equivalent to solving an ordinary identification
problem. sID alternates between these two types of
transportability, and whenever it exhausts the possibility of
applying these operations, it exits with failure and returns a
counterexample for transportability; that is, the graph local to
the faulty call witnesses the non-transportability of the causal
query since it contains a s-hedge as an edge subgraph.

Before  showing  the  more  formal  properties  of  sID,  we 
demonstrate  how  sID  works  through  the  transportability  of 
Q  =  P(y\do(x))  in  the  graph  in  Fig.  1(c). 

Since $D = An(Y)$ and $C(D \setminus \{X\}) = (C_0, C_1, C_2)$,
where $C_0 = D(\{Z\})$, $C_1 = D(\{W\})$, and $C_2 = D(\{V, Y\})$,
we invoke line 4 and try to transport respectively
$Q_0 = P^*_{x,w,v,y}(z)$, $Q_1 = P^*_{x,z,v,y}(w)$, and
$Q_2 = P^*_{x,z,w}(v, y)$. Thus the original problem reduces to
transporting
$\sum_{z,w,v} P^*_{x,w,v,y}(z)\, P^*_{x,z,v,y}(w)\, P^*_{x,z,w}(v, y)$.
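The decomposition step can be illustrated with a minimal sketch. It assumes only the node set {X, Z, W, V, Y} and the three confounding arcs mentioned in this walkthrough (X and Z, V and Y, X and Y); the function name `c_components` and the pair-list encoding of bidirected arcs are illustrative choices, not part of sID as specified in Fig. 4.

```python
from collections import defaultdict


def c_components(nodes, bidirected):
    """Partition `nodes` into c-components: maximal sets linked by bidirected arcs."""
    nodes = set(nodes)
    adj = defaultdict(set)
    for a, b in bidirected:
        if a in nodes and b in nodes:  # arcs touching a removed node vanish
            adj[a].add(b)
            adj[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first search over bidirected arcs only
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(frozenset(comp))
    return components


# Assumed encoding of the confounding arcs named in the walkthrough.
V = {"X", "Z", "W", "V", "Y"}
ARCS = [("X", "Z"), ("V", "Y"), ("X", "Y")]

without_x = c_components(V - {"X"}, ARCS)  # plays the role of C(D \ {X})
full = c_components(V, ARCS)               # plays the role of C(D)
print(sorted(map(sorted, without_x)))
print(sorted(map(sorted, full)))
```

Under these assumptions, removing X splits the remaining nodes into the three blocks {Z}, {W}, and {V, Y}, which is the partition used in line 4 of the walkthrough.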

Evaluating the first expression, we trigger line 2, noting
that nodes that are not ancestors of Z can be ignored. This
implies that $P^*_{x,w,v,y}(z) = P^*_x(z)$ with induced subgraph
$G_0 = \{X \to Z,\ X \leftarrow U_{xz} \to Z\}$, where $U_{xz}$ stands for
the hidden variable between X and Z. sID goes to line 5, in
which in the local call $C(D \setminus \{X\}) = \{G_0\}$. Note that in the
ordinary identifiability problem the procedure would fail at
this point, but sID proceeds to line 6, testing whether
$(S \perp\!\!\!\perp Z \mid X)_{D_{\overline{X}}}$. The test comes out true, which
makes sID directly transport $Q_0$ with data from the
experimental domain $\Pi$, i.e., $P^*_x(z) = P_x(z)$.

Evaluating the second expression, we again trigger line
2, which implies that $P^*_{x,z,v,y}(w) = P^*_{x,z}(w)$ with induced
subgraph $G_1 = \{X \to Z,\ Z \to W,\ X \leftarrow U_{xz} \to Z\}$. sID
goes to line 5, in which in the local call $C(D \setminus \{X\}) = \{G_1\}$.
Thus it proceeds to line 6, testing whether
$(S \perp\!\!\!\perp W \mid X, Z)_{D_{\overline{X,Z}}}$. The test comes out true again,
which makes sID directly transport $Q_1$ with data from the
experimental domain $\Pi$, i.e., $P^*_{x,z}(w) = P_{x,z}(w)$.

Evaluating  the  third  expression,  sID  goes  to  line  5  in 
which  C(D  \  {X,  Z,W})  =  {G2},  where  G2  =  {V  —> 


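function sID(y, x, $P^*$, I, D)

INPUT: x, y value assignments; $P^*$ observational distribution
in $\Pi^*$; I set of interventional distributions in $\Pi$; D a
selection diagram; S set of selection nodes.

OUTPUT: Expression for $P^*_x(y)$ in terms of $P^*$, I, or
FAIL(F, F').

1 if $x = \emptyset$, return $\sum_{v \setminus y} P^*(v)$

2 if $V \setminus An(Y)_D \neq \emptyset$,
    return sID(y, $x \cap An(Y)_D$, $\sum_{V \setminus An(Y)_D} P^*$, $An(Y)_D$)

3 set $W = (V \setminus X) \setminus An(Y)_{D_{\overline{X}}}$.
    if $W \neq \emptyset$, return sID(y, $x \cup w$, $P^*$, D)

4 if $C(D \setminus X) = \{C_0, C_1, \ldots, C_k\}$,
    return $\sum_{v \setminus \{y, x\}} \prod_i$ sID($c_i$, $v \setminus c_i$, $P^*$, D)

5 if $C(D \setminus X) = \{C_0\}$,

6     if $(S \perp\!\!\!\perp Y \mid X)_{D_{\overline{X}}}$, return $P(y \mid do(x))$

7     if $C(D) = \{D\}$, FAIL(D, $C_0$)

8     if $C_0 \in C(D)$, return $\prod_{i \mid V_i \in C_0} P^*(v_i \mid v_D^{(i-1)})$

9     if $(\exists C')\ C_0 \subset C' \in C(D)$, return sID(y, $x \cap C'$,
      $\prod_{i \mid V_i \in C'} P^*(V_i \mid V_D^{(i-1)} \cap C', v_D^{(i-1)} \setminus C')$, $C'$)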

Figure  4:  Modified  version  of  identification  algorithm  capa¬ 
ble  of  recognizing  transportable  relations. 

Y, S → V, V ← U_vy → Y}. It proceeds to line 6, testing
whether $(S \perp\!\!\!\perp V, Y \mid X, Z, W)_{D_{\overline{X,Z,W}}}$, which is false in
this case.

It tests the other conditions until it reaches line 9, in which
$C' = G_0 \cup G_2 \cup \{X \leftarrow U_{xy} \to Y\}$. Thus it tries to transport
$Q_2 = P^*_{x,z}(v, y)$ over the induced graph $C'$, which amounts
to ordinary identification, and trivially yields (after
simplification) $P^*(v \mid w)\, P^*(y \mid v, w)$. The results of these
calls, composed, indeed coincide with the expression provided in
the first section.
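As a worked summary, the three returns reported in this walkthrough can be substituted into the sum produced by line 4. The display below is assembled only from those three returns; any further simplification made in the first section is not reproduced here.

```latex
\begin{align*}
P^*_x(y) &= \sum_{z,w,v} Q_0 \, Q_1 \, Q_2 \\
         &= \sum_{z,w,v} P_x(z) \, P_{x,z}(w) \, P^*(v \mid w) \, P^*(y \mid v, w)
\end{align*}
```

The first two factors are taken from the experimental domain (direct transportability), while the last two are estimated from observations in the target domain (trivial transportability).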

We  prove  next  soundness  and  completeness  of  sID. 

Theorem  6  (soundness).  Whenever  sID  returns  an  expres¬ 
sion  for  Px(y),  it  is  correct. 

Proof.  See  Appendix  2.  □ 

Theorem 7. Assume sID fails to transport $P^*_x(y)$ (executes
line 7). Then there exist $X' \subseteq X$, $Y' \subseteq Y$ such that the
graph pair $D, C_0$ returned by the fail condition of sID
contains as edge subgraphs s*-trees $F, F'$ that form an s-hedge
for $P^*_{x'}(y')$.

Proof.  See  Appendix  2.  □ 

Corollary  3  (completeness).  sID  is  complete. 

Proof.  See  Appendix  2.  □ 

Conclusions 

We provide a complete (necessary and sufficient) graphical
condition for deciding when the causal effect of one set of
variables on another can be transported from an experimental to
a non-experimental environment. We further provide a complete
algorithm for computing the correct transport formula
whenever this graphical condition holds.


Appendix  1 

The  do-calculus  [Pearl,  1995]  consists  of  three  rules  that  per¬ 
mit  us  to  transform  expressions  involving  do-operators  into 
other  expressions  of  this  type,  whenever  certain  conditions 
hold  in  the  causal  diagram  G.  (See  footnote  1  for  semantics.) 

We consider a DAG G in which each child-parent family
represents a deterministic function $X_i = f_i(pa_i, \epsilon_i)$,
$i = 1, \ldots, n$, where $pa_i$ are the parents of variable $X_i$ in G,
and $\epsilon_i$, $i = 1, \ldots, n$, are arbitrarily distributed random
disturbances, representing background factors that the
investigator chooses not to include in the analysis.

Let X, Y, and Z be arbitrary disjoint sets of nodes
in a causal DAG G. An expression of the type
$E = P(y \mid do(x), z)$ is said to be compatible with G if the
interventional distribution described by E can be generated by
parameterizing the graph with a set of functions $f_i$ and a set
of distributions of $\epsilon_i$, $i = 1, \ldots, n$.

We denote by $G_{\overline{X}}$ the graph obtained by deleting from
G all arrows pointing to nodes in X. Likewise, we denote
by $G_{\underline{X}}$ the graph obtained by deleting from G all arrows
emerging from nodes in X. To represent the deletion of both
incoming and outgoing arrows, we use the notation $G_{\overline{X}\underline{Z}}$.

The  following  three  rules  are  valid  for  every  interven¬ 
tional  distribution  compatible  with  G. 

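Rule 1 (Insertion/deletion of observations):

$P(y \mid do(x), z, w) = P(y \mid do(x), w)$

if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}}}$

Rule 2 (Action/observation exchange):

$P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)$

if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\underline{Z}}}$

Rule 3 (Insertion/deletion of actions):

$P(y \mid do(x), do(z), w) = P(y \mid do(x), w)$

if $(Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\,\overline{Z(W)}}}$,

where $Z(W)$ is the set of Z-nodes that are not ancestors of
any W-node in $G_{\overline{X}}$.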

The  do-calculus  was  proven  to  be  complete  [Shpitser  and 
Pearl,  2006a;  Huang  and  Valtorta,  2006],  in  the  sense  that  if 
an  equality  cannot  be  established  by  repeated  application  of 
these  three  rules,  it  is  not  valid. 
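The side conditions of the three rules are d-separation statements in mutilated graphs, so they can be checked mechanically. The following is a minimal sketch of such a check for Rule 2 with empty X and W, using the standard ancestral moral graph criterion for d-separation. The two toy diagrams, the explicit node U for the latent common cause, and all function names are hypothetical illustrations rather than anything defined in this paper.

```python
def ancestors(edges, targets):
    """Return `targets` together with every node that has a directed path into them."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    found, stack = set(), list(targets)
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents.get(node, ()))
    return found


def d_separated(edges, xs, ys, zs):
    """True if xs and ys are d-separated by zs in the DAG given as (parent, child) pairs."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(edges, xs | ys | zs)
    parents = {}
    neighbours = {node: set() for node in keep}
    for a, b in edges:
        if a in keep and b in keep:
            neighbours[a].add(b)
            neighbours[b].add(a)
            parents.setdefault(b, set()).add(a)
    for co_parents in parents.values():  # moralize: marry parents of a common child
        for p in co_parents:
            neighbours[p] |= co_parents - {p}
    reached, stack = set(), list(xs)
    while stack:  # undirected search that never enters a conditioned node
        node = stack.pop()
        if node in reached or node in zs:
            continue
        reached.add(node)
        stack.extend(neighbours[node])
    return not (reached & ys)


def cut_outgoing(edges, nodes):
    """Delete all arrows emerging from `nodes` (the underline operation)."""
    return [(a, b) for a, b in edges if a not in nodes]


# Hypothetical diagrams; the latent confounder U is written as an explicit node.
unconfounded = [("Z", "Y")]
confounded = [("Z", "Y"), ("U", "Z"), ("U", "Y")]

rule2_plain = d_separated(cut_outgoing(unconfounded, {"Z"}), {"Z"}, {"Y"}, set())
rule2_conf = d_separated(cut_outgoing(confounded, {"Z"}), {"Z"}, {"Y"}, set())
rule2_adjusted = d_separated(cut_outgoing(confounded, {"Z"}), {"Z"}, {"Y"}, {"U"})
print(rule2_plain, rule2_conf, rule2_adjusted)
```

In the unconfounded toy diagram the condition holds, so do(z) may be exchanged for observing z. With the latent common cause it fails, and it holds again only if U could be observed and conditioned on.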

Appendix  2 

Lemma 4. Let X, Y be sets of variables. Let M, M* be a
pair of causal models and G the respective selection diagram.
Then $Q = P^*_x(y)$ is transportable in G if and only if
Q is transportable in $G_{An(Y)}$.

Proof. See [Tian, 2002, Chapter 5], which provides an analogous
construction. □

Theorem 3. Let G be a selection diagram. Then for any
node Y, the direct effect $P^*_{pa(Y)}(y)$ is transportable if there
is no subgraph of G which forms a Y-rooted sC-tree.


Proof. We know from [Tian, 2002, Theorem 22] that the direct
effect on Y is identifiable whenever there exists no subgraph
$G_T$ of G satisfying all of the following: (i) $Y \in T$; (ii) $G_T$
has only one c-component, T itself; (iii) all variables in T
are ancestors of Y in $G_T$. sC-trees are structures of this
type. Further, [Shpitser and Pearl, 2006b, Theorem 2] showed
that the same holds for C-trees, which also implies the
nonexistence of sC-trees. Since no such structure shows up in G,
the target quantity is identifiable, and hence transportable.

It remains to show that the same holds whenever there exists
a subgraph that is a C-tree and in which no S node points
to Y, i.e., there is no Y-rooted sC-tree at all. It is true that
$(S \perp\!\!\!\perp Y \mid Pa(Y))_{G_{\overline{Pa(Y)}}}$, given that all paths from S to Y
are closed. This follows from the following facts: 1) all paths
from S passing through Y's ancestors were cut in $G_{\overline{Pa(Y)}}$;
2) all bidirected paths were also closed, given that the
conditioning set contains only root nodes and a connection from
S must pass through at least one collider; 3) by Lemma 4,
transportability does not depend on descendants of Y. Thus,
it follows that we can write
$P^*_{pa(Y)}(Y) = P_{pa(Y)}(Y \mid S) = P_{pa(Y)}(Y)$, concluding the proof. □

Corollary 1. Let G be a selection diagram. Then for any
node Y, the direct effect $P^*_{pa(Y)}(y)$ is transportable if there
is no S node pointing to Y.

Proof.  Follows  directly  from  Theorem  3.  □ 

Lemma  5.  The  exclusive  OR  (XOR)  function  is  commutative 
and  associative. 

Proof.  Follows  directly  from  the  definition  of  the  XOR 
function.  □ 
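Since XOR acts on single bits, Lemma 5 can also be confirmed by exhaustive enumeration. The sketch below does so and additionally checks that folding XOR over a collection gives the same parity under every ordering, which is the property the later constructions rely on. The helper name `parity` and the sample tuple are illustrative assumptions.

```python
from functools import reduce
from itertools import permutations, product
from operator import xor

BITS = (0, 1)

# Commutativity and associativity, checked over all bit assignments.
commutative = all(a ^ b == b ^ a for a, b in product(BITS, repeat=2))
associative = all((a ^ b) ^ c == a ^ (b ^ c) for a, b, c in product(BITS, repeat=3))


def parity(values):
    """Fold XOR over a collection of bits; the empty collection has parity 0."""
    return reduce(xor, values, 0)


# The parity of a collection does not depend on the order of application.
sample = (1, 0, 1, 1)
order_free = len({parity(order) for order in permutations(sample)}) == 1
print(commutative, associative, order_free)
```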

Remark 1. Although the result below is a strict generalization
of Theorem 2, its construction is still worth giving
for two reasons: first, it simplifies the construction
given for Theorem 2; second, it sets the tone for proofs
over generic graph structures, which will later prove
instrumental in establishing non-transportability in arbitrary
structures.

Theorem  4.  Let  G  be  a  Y-rooted  sC-tree.  Then  the  effects 
of  any  set  of  nodes  in  G  on  Y  are  not  transportable. 

Proof. The proof will proceed by constructing a family of
counterexamples. For any such G and any set X, we will
construct two causal models $M_1$ and $M_2$ that agree
on $(P, P^*, I)$ but disagree on the interventional distribution
$P^*_x(y)$.

Let the two models $M_1$, $M_2$ agree on the following features.
All variables in $U \cup V$ are binary. All exogenous variables
are distributed uniformly. All endogenous variables
except Y are set to the bit parity (xor) of the values of their
parents. The two models differ with respect to Y's definition.
Consider the function for Y, $f_Y : U, Pa(Y) \to Y$, to be
defined as follows:

$$\begin{cases} M_1: Y = ((pa(Y) \oplus u) \oplus s) \\ M_2: Y = ((pa(Y) \oplus u) \vee s) \end{cases}$$


Lemma  6.  The  two  models  agree  in  the  distributions 

(P,P*,I). 

Proof. Since the two models agree on P(U) and all functions
except $f_Y$, it suffices to show that $f_Y$ maintains the
same input/output behavior in both models in each domain.

Subclaim 1: Let us show that both models agree in the
observational and interventional distributions relative to
domain $\Pi$, i.e., the pair (P, I). The index variable S is set to 0
in $\Pi$, and $f_Y$ evaluates to $(pa(Y) \oplus u)$ in both models, which
proves the subclaim.

Subclaim 2: Let us show that both models agree in the
observational distribution relative to $\Pi^*$, i.e., $P^*$. The index
variable S is set to 1 in $\Pi^*$, and $f_Y$ evaluates to
$((pa(Y) \oplus u) \oplus 1)$ in $M_1$, and to 1 in $M_2$. Since the evaluation
in $M_1$ can be rewritten as $\neg(pa(Y) \oplus u)$, it remains to show
that $(pa(Y) \oplus u)$ always evaluates to 0.

This is indeed the case. Consider the following observations:
a) each variable in U has exactly two endogenous
children; b) the given tree has Y as the root; c) all functions
are XOR. Together these imply that Y computes the bit parity of
a sum in which each U node is counted twice, which is therefore
even, and so evaluates to 0, proving the subclaim. □

Lemma 7. For any set X,
$P_1(Y \mid do(X), S = 1) \neq P_2(Y \mid do(X), S = 1)$.

Proof. Given the functional description and the discussion
in the previous lemma, the function $f_Y$ always evaluates to
1 in $M_2$.

Now let us consider $M_1$. Note that performing the
intervention and cutting the edges going toward X creates an
asymmetry in the sum over the bidirected edges departing
from U, and consequently in the sum performed by Y. It
will be the case that some $U'$ will appear only once in the
expression of Y. Therefore, depending on the assignment
$X = x$, we will need to evaluate the sum (mod 2) over $U'$ in
Y or its negation, which, given the uniformity of the
distribution of U, will yield $P_1(Y \mid do(X), S = 1) = 1/2$ in both
cases. □

By  Lemma  1,  Lemmas  6  and  7  together  prove  Theorem 
4.  □ 
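The construction can be checked by exact enumeration on a small instance. The sketch below assumes the smallest graph one might read as a Y-rooted sC-tree (X → Y, a latent U that is a common cause of X and Y, and S → Y). This graph, the function names, and the dictionary encoding of distributions are illustrative assumptions, not taken from the paper's figures.

```python
from fractions import Fraction


def f_y(model, x, u, s):
    """Structural equation for Y; the two models differ only here."""
    base = x ^ u  # parity of Y's parents, as in both models
    return base ^ s if model == 1 else base | s


def distribution(model, s, do_x=None):
    """Exact distribution of (X, Y) with U uniform; X = U unless fixed by do()."""
    dist = {}
    for u in (0, 1):
        x = u if do_x is None else do_x
        y = f_y(model, x, u, s)
        dist[(x, y)] = dist.get((x, y), 0) + Fraction(1, 2)
    return dist


# Source domain (S = 0): observational P and interventional I agree.
agree_P = distribution(1, 0) == distribution(2, 0)
agree_I = all(distribution(1, 0, x) == distribution(2, 0, x) for x in (0, 1))
# Target domain (S = 1): observational P* agrees, the effect of do(x) does not.
agree_P_star = distribution(1, 1) == distribution(2, 1)
differ_target = all(distribution(1, 1, x) != distribution(2, 1, x) for x in (0, 1))
print(agree_P, agree_I, agree_P_star, differ_target)
```

On this instance the two models are indistinguishable from everything that is measured, yet under do(x) in the target domain Y is uniform in the first model and constantly 1 in the second, mirroring Lemmas 6 and 7.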


Corollary 2. Let G be a selection diagram, and let X and Y
be sets of variables. If there exists a node W which is an
ancestor of some node $Y \in \mathbf{Y}$ and such that there exists a
W-rooted sC-tree which contains any variables in X, then
$P^*_x(y)$ is not transportable.

Proof. Fix a W-rooted sC-tree T and a path p from W to Y.
Consider the graph $p \cup T$. Note that in this graph
$P^*_x(Y) = \sum_w P^*_x(w)\, P^*(Y \mid w)$. By the last theorem, $P^*_x(w)$ is
not transportable. It is now easy to construct $P^*(Y \mid W)$ in such a
way that the mapping from $P^*_x(W)$ to $P^*_x(Y)$ is one to one,
while making sure all distributions are positive. □

Figure 5: Selection diagrams in which $P(y \mid do(x))$ is not
transportable; there is no sC-tree, but there is an s*-tree. These
diagrams will be used as the basis for the general case;
diagram (a) is named sp-graph and diagram (b) sb-graph.

Remark 2. The previous results comprised cases in which
sC-trees were involved in the non-transportability of Y,
i.e., Y or some of its ancestors were roots of a given sC-tree.
In the problem of identifiability, the counterpart of the
sC-tree (i.e., the C-tree) suffices to characterize
non-identifiability for singleton Y. But transportability is more
subtle, and this is not the case here: it depends not only on the
“locations” of X and Y in the graph, but also on the relative
position of S. Consider Figures 3 and 5(a) (sp-graph). In these
graphs there is no sC-tree, but the effect of X on Y is still
non-transportable.

The main technical subtlety here is that in sC-trees, an
S-node combines its effect with an X-node intersecting in the
root node (considering only the bidirected edges), which is
not the case for non-transportability in general. Note that in
the graph in Figure 3 and in the sp-graph, the nodes S and
X intersect first through ordinary edges and meet through
bidirected edges only at the Y node. This implies a certain
“asynchrony”, because in the structural sense the presence of
an S-node implies a difference in the structural
equations between domains. But a difference in the structural
sense alone does not imply non-transportability; for instance,
$P^*_x(z)$ is transportable in the sp-graph even though the
equations of Z differ across the two models.

The key idea to produce a proof for non-transportability in
these cases is to keep the effect of S-nodes after intersecting
with X “dormant” until they reach the target Y and then
manifest. We implement this idea in the next proof, which
is one base case and should pave the way for the most
general problem.

Theorem 8. $P^*_x(Y)$ is not transportable in the sp-graph
(Fig. 5(a)).

Proof. We will construct two causal models $M_1$ and $M_2$
compatible with the sp-graph that agree on $(P, P^*, I)$
but disagree on the interventional distribution $P^*_x(Y)$.
Let us assume that all variables in $U \cup V$ are binary, and
let $U_1$ be the common cause of X and Y, $U_2$ be the common
cause of Z and Y, and $U_3$ be the random disturbance
exclusive to X. Let $M_1$ and $M_2$ agree on the following
definitions:

$$M_1, M_2: \begin{cases} X = U_1 \oplus U_3 \\ Y = Z \oplus U_1 \oplus U_2 \end{cases}$$

and disagree with respect to Z as follows:

$$\begin{cases} M_1: Z = X \oplus U_2 \oplus S \\ M_2: Z = ((X \oplus U_2) \vee S) \oplus (S \wedge \overline{(X \oplus U_2)}) \end{cases}$$

Both models also agree with respect to P(U), which is defined
as follows: $P(U_1) \neq 1/2$, $P(U_2) = P(U_3) = 1/2$.

Lemma  8.  The  two  models  agree  in  the  distributions 

(P,P*,I). 

Proof. Subclaim 1: Let us show that both models agree in
the observational and interventional distributions relative to
domain $\Pi$, i.e., the pair (P, I). The index variable S is set to
0 in $\Pi$, and $f_Z$ evaluates to $(X \oplus U_2)$ in both models. Since
the two models agree on P(U) and all other functions,
the two models generate the same distributions for $\Pi$.

Subclaim 2: Let us show that both models agree in the
observational distribution $P^*$ relative to $\Pi^*$. The index variable
S is set to 1 in $\Pi^*$, and $f_Z$ evaluates to
$((U_1 \oplus U_2 \oplus U_3) \oplus 1)$ in $M_1$, and to $(U_1 \oplus U_2 \oplus U_3)$ in $M_2$.

Before  completing  this  proof,  consider  first  the  next  two 
statements. 

Subclaim 3: Let X and Y be two binary variables such that
$P(X = x) = p \neq 1/2$ and $P(Y = y) = q = 1/2$. Then the
probabilistic input/output behavior of $Z = XOR(X, Y)$ is
the same as that of Y. The variable $Z = 1$ whenever
$\{(X = 1, Y = 0), (X = 0, Y = 1)\}$, which happens with probability
$pq + (1 - p)(1 - q)$. Since $q = 1/2$, the expression reduces to
$p \cdot 1/2 + (1 - p) \cdot 1/2 = 1/2$.

Subclaim 4: Let X and Y be two binary variables such that
$P(X = x) = P(Y = y) = p = 1/2$. Then the probabilistic
input/output behavior of $W = XOR(X, Y)$ is the same as that of
X (or Y). This follows directly from Subclaim 3.
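Subclaims 3 and 4 say that XOR-ing any independent bit with a fair bit yields a fair bit. A minimal exact check follows, under the assumed parametrization p = P(X = 1) and q = P(Y = 1) with X and Y independent (the subclaims leave x and y unspecified). With this reading, the XOR equals 1 with probability p(1 - q) + (1 - p)q, which also collapses to 1/2 when q = 1/2. The function name is illustrative.

```python
from fractions import Fraction


def xor_prob_one(p, q):
    """P(X xor Y = 1) for independent bits with P(X = 1) = p and P(Y = 1) = q."""
    return p * (1 - q) + (1 - p) * q


HALF = Fraction(1, 2)
biased = [Fraction(0), Fraction(1, 10), Fraction(3, 4), Fraction(1)]

# A fair second bit washes out any bias in the first one.
always_fair = all(xor_prob_one(p, HALF) == HALF for p in biased)
# With an unfair second bit, the bias of the first one stays visible.
still_biased = xor_prob_one(Fraction(1, 10), Fraction(1, 4))
print(always_fair, still_biased)
```

This is why the constructions keep every disturbance uniform except the one whose bias is meant to surface in the target quantity.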

Now let us consider again Subclaim 2. From Subclaims 3
and 4 together with the distribution P(U), it follows that $f_Z$
evaluates in the same way in both models.

In turn, consider the behavior of $f_Y$, which evaluates to $\overline{U_3}$
in $M_1$ and to $U_3$ in $M_2$. Since $U_3$ is uniformly distributed,
the distribution of Y agrees in the two models. □

Lemma 9. There exist values of X, Y such that
$P_1(Y \mid do(X), S = 1) \neq P_2(Y \mid do(X), S = 1)$.

Proof. Fix X = 1, Y = 1. First notice that $f_Z$ evaluates
to $U_2$ in $M_1$ and to $\overline{U_2}$ in $M_2$. Given that $U_2$ is distributed
uniformly, both quantities coincide in distribution (and they
represent the effect of X on Z, which is transportable in G).
Now the evaluation of $f_Y$ in $M_1$ reduces to $U_1$, while it reduces
to $\overline{U_1}$ in $M_2$. It follows that in $M_1$, $f_Y$ evaluates to 1 with
probability $P(U_1 = 1)$, while in $M_2$ it evaluates to 1 with
probability $P(U_1 = 0)$; these disagree by construction, finishing
the proof of this lemma. □

By  Lemma  1,  Lemmas  8  and  9  together  prove  Theorem 
8.  □ 

Remark 3. We have a different sort of asymmetry in the
case of Fig. 5(b) (called sb-graph). In this case, the nodes
X and S do not intersect before meeting Y, i.e., they have
disjoint paths and Y lies precisely in their intersection.

Still, this case is not the same as having an sC-tree, because
in sb-graphs we need to keep the equality from the S nodes
to Y until S intersects X on Y. Employing a construction
similar to the one for the sp-graph, we keep the effect of S
dormant until it reaches Y and then emerges.


Theorem 9. $P^*_x(Y)$ is not transportable in the sb-graph
(Fig. 5(b)).

Proof. We construct two causal models $M_1$ and $M_2$
compatible with the sb-graph that agree on $(P, P^*, I)$ but
disagree on the interventional distribution $P^*_x(Y)$.

Let us assume that all variables in $U \cup V$ are binary, and
let $U_1$ be the common cause of X and Y, $U_2$ be the common
cause of Z and Y, and $U_3$ be the random disturbance
exclusive to X. Let $M_1$ and $M_2$ agree on the following
definitions:

$$M_1, M_2: \begin{cases} X = U_1 \oplus U_3 \\ Y = X \oplus Z \oplus U_1 \oplus U_2 \end{cases}$$

and disagree with respect to Z as follows:

$$\begin{cases} M_1: Z = U_2 \oplus S \\ M_2: Z = ((U_2 \vee S) \oplus (S \wedge \overline{U_2})) \end{cases}$$

Both models also agree with respect to P(U), which is defined
as follows: $P(U_1) \neq 1/2$, $P(U_2) = P(U_3) = 1/2$.

Lemma  10.  The  two  models  agree  in  the  distributions 

(P,P*,I). 

Proof. Subclaim 1: Let us show that both models agree in
the observational and interventional distributions relative to
domain $\Pi$, i.e., the pair (P, I). The index variable S is set to
0 in $\Pi$, and $f_Z$ evaluates to $U_2$ in both models. Since
the two models agree on P(U) and all other functions,
the two models generate the same distributions for $\Pi$.

Subclaim 2: Let us show that both models agree in the
observational distribution $P^*$ relative to $\Pi^*$. The index variable
S is set to 1 in $\Pi^*$, and $f_Z$ evaluates to $(U_2 \oplus 1)$ in $M_1$ and
to $U_2$ in $M_2$. Given that these variables are uniformly
distributed, both models agree in P(Z). Now let us consider the
behavior of $f_Y$: it evaluates to $\overline{U_3}$ in $M_1$ and to $U_3$ in $M_2$,
and since $U_3$ is uniformly distributed, it is the same in both
models. □

Lemma 11. There exist values of X, Y such that
$P_1(Y \mid do(X), S = 1) \neq P_2(Y \mid do(X), S = 1)$.

Proof. Fix X = 1, Y = 1. First notice that $f_Z$ evaluates to
$\overline{U_2}$ in $M_1$ and to $U_2$ in $M_2$. The evaluation of $f_Y$ in $M_1$
reduces to $U_1$, while it reduces to $\overline{U_1}$ in $M_2$. It follows that
in $M_1$, $f_Y$ evaluates to 1 with probability $P(U_1 = 1)$, while in
$M_2$ it evaluates to 1 with probability $P(U_1 = 0)$; these disagree
by construction, finishing the proof of this lemma. □

By Lemma 1, Lemmas 10 and 11 together prove Theorem
9. □

Remark 4. We have two complementary components
to forge a general scheme to prove arbitrary
non-transportability. First, the construction of Theorem 4 shows
how to prove non-transportability for general structures
such as sC-trees. Second, the specific proofs of
non-transportability for the sp-graph (Theorem 8) and sb-graph
(Theorem 9) partition the possible interactions between X,
S and Y. In the former, X and S intersect before meeting
with Y, while in the latter they have disjoint paths and Y
lies in their intersection. Not surprisingly, the proof for the
general case basically combines these analyses, as we
show below.

Theorem 5. Assume there exist F, F' that form an s-hedge
for $P^*_x(y)$ in $\Pi$ and $\Pi^*$. Then $P^*_x(y)$ is not transportable
from $\Pi$ to $\Pi^*$.

Proof. We first consider counterexamples with the induced
graph $H = De(F)_G \cap An(Y)_{G_{\overline{X}}}$, and assume, without
loss of generality, that H is a forest. We use the previous
constructions of Theorems 8 and 9. We construct two causal
models $M_1$ and $M_2$ that agree on $(P, P^*, I)$ but
disagree on the interventional distribution $P^*_x(Y)$.

Let F be an R-rooted s*-tree, let V be the set of observable
variables and U be the set of unobservable variables in
F. Let us assume that all variables in $U \cup V$ are binary. Call
the set of variables pointed to by S-nodes $\mathbf{W}$, which by the
definition of s*-tree is guaranteed to be non-empty.

In both models, let $V_i \in V \setminus \mathbf{W}$ compute the bit parity
(xor) of all its parents, i.e., $V_i = \oplus(Pa_i \cup U_i)$, where the xor
applied to a set represents the operation applied recursively
over each element of the set and the result computed so far.
The order of application of the operator does not affect the
final result, by Lemma 5.

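Now, define each $W \in \mathbf{W}$ as follows:

$$\begin{cases} M_1: W = (\oplus(Pa_W \cup U_W)) \oplus S \\ M_2: W = ((\oplus(Pa_W \cup U_W)) \vee S) \oplus (S \wedge \overline{\oplus(Pa_W \cup U_W)}) \end{cases}$$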


Let $D = F \setminus F'$. Note that there exists $U_X \in U$ such that
$U_X \to X$ and $U_X$ connects the nodes in D to the nodes in
F'. Without loss of generality, consider that for each node
$X \in \mathbf{X}$, there exists also an exclusive noise random factor,
itself a member of U.

Furthermore, let the distribution of $U_i \in U \setminus \{U_X\}$ be
such that $P(U_i) = 1/2$. Let $P(U_X) \neq 1/2$ in both models,
and also $P(U_X) \neq P(U_{X'})$ for all $X, X' \in \mathbf{X}$. In words, the
exclusive noise variables of X-nodes are uniformly distributed,
but the nodes connecting X to the ones in D are not, and all of
them have a different parametrization.

Lemma  12.  The  two  models  agree  in  the  distributions 

(P,P*,I). 

Proof. Subclaim 1: Let us show that both models agree in
the observational and interventional distributions relative to
domain $\Pi$, i.e., the pair (P, I). The index variable S is set
to 0 in $\Pi$, and since S = 0 is cancelled in both expressions,
each $W \in \mathbf{W}$ evaluates to $\oplus(Pa_W \cup U_W)$ in both models. Since
the two models agree upon P(U) and all other functions,
they induce the same distributions for $\Pi$.

Subclaim 2: Let us show that both models agree in the
observational distribution $P^*$ relative to $\Pi^*$. The index
variable S is set to 1 in $\Pi^*$, and for each $W \in \mathbf{W}$, $f_W$
evaluates to $(U_W \oplus 1)$ in $M_1$ and to $U_W$ in $M_2$. Given that they
are uniformly distributed, they induce the same distribution
over $\mathbf{W}$. For each descendant $V_i$ of W (including Y itself),
there are two possible cases. First, it is possible that
there exists $U_W$ in the functional model necessary to compute
$f_{V_i}$, which is distributed uniformly and so induces the
same distribution in both models (by Subclaim 3 of Lemma
8). Alternatively, there exists $U_q \in U$ which is exclusive
to X and odd, therefore not being cancelled out and present
in the final expression. Similarly to the previous case, the
same distribution is induced by both functional models by
Subclaim 3 of Lemma 8. □

Lemma 13. There exists a value assignment x for X such
that $P_1(R \mid do(X), S = 1) \neq P_2(R \mid do(X), S = 1)$.

Proof. Fix X = 1, Y = 1. First notice that $f_W$ evaluates
to $U_W$ in $M_2$ and to $\overline{U_W}$ in $M_1$. The evaluation of $f_Y$ in $M_1$
reduces to $\oplus(\bigcup U_X)$, while it reduces to $\oplus(\bigcup \overline{U_X})$ in
$M_2$. Therefore, it follows that in $M_1$, $f_Y$ evaluates to 1 with
probability $\prod_i P(U_i = 1)$, while in $M_2$ it evaluates to 1 with
probability $\prod_i P(U_i = 0)$; these disagree, by assumption,
finishing the proof of the lemma. □

Lemma  14.  If  P*(R)  is  not  transportable,  then  neither  is 
P*(Y). 

Proof. If R = Y, we are done. Otherwise, there exists a
set of paths from R to Y, because by construction
$R \subseteq An(Y)_{G_{\overline{X}}}$. Consider functional models in which each node
in these paths simply mirrors the value of one of its parents.
The nodes in Y would then have the same distribution as the
ones in R, finishing the proof of the lemma. □

Finally,  Lemma  1  together  with  Lemmas  12,  13  and  14 
prove  Theorem  5.  □ 

Theorem  6  (soundness).  Whenever  sID  returns  an  expres¬ 
sion  for  Px( y),  it  is  correct. 

Proof. The correctness of the identifiability calls, which are
performed by sID over $\Pi^*$ and called trivial transportability,
was already established elsewhere [Huang and Valtorta, 2006;
Shpitser and Pearl, 2006b].

It remains to show the correctness of the test in line 6
of sID. First note that, by construction, X is always a set
of pre-treatment covariates. The correctness then follows
directly from the S-admissibility of X together with Corollary 1
in [PB, 2011]. □

Remark  5.  The  next  results  are  similar  to  the  analogous 
ones  for  identification  given  in  [Tian  and  Pearl,  2002]  and 
[Shpitser  and  Pearl,  2006a]. 

Theorem 7. Assume sID fails to transport $P^*_x(y)$ (executes
line 7). Then there exist $X' \subseteq X$, $Y' \subseteq Y$ such that the
graph pair $D, C_0$ returned by the fail condition of sID
contains as edge subgraphs s*-trees $F, F'$ that form an s-hedge
for $P^*_{x'}(y')$.


Proof. Before failure, sID evaluated false consecutively at
lines 5 and 6, so D local to this call is an sC-component;
let R be its root set. We can remove some directed arrows
from D while preserving R as root, yielding an R-rooted
s*-tree F. Since by construction $F' = F \cap C_0$ is closed under
descendants and only directed arrows were removed, both
F and F' are s*-trees. Also by construction, $R \subseteq An(Y)_{D_{\overline{x}}}$,
which, together with the fact that X and Y from the recursive
call are clearly subsets of the original input, finishes the
proof. □

Lemma 15. Let X, Y be sets of variables. If $Q = P^*_x(y)$
is not transportable in G, then Q is not transportable in the
graph resulting from adding a directed or bidirected edge to
G. Equivalently, if Q is transportable in G, then it is also
transportable in the graph resulting from removing a directed or
bidirected edge from G.

Proof.  This  result  is  obvious,  see  [Tian,  2002,  Chapter  5] 
that  provides  an  analogous  construction.  □ 

Lemma 16. Let X, Y be sets of variables. If $Q = P^*_x(y)$
is not transportable with respect to the selection diagram G,
then Q is not transportable in the selection diagram resulting
from adding selection nodes to G. Equivalently, if Q is
transportable in G, then it is also transportable in the graph
resulting from removing selection nodes from G.

Proof.  This  proof  is  also  obvious  and  follows  the  same 
structure  of  Lemmas  4  and  15.  □ 

Corollary  3  (completeness).  sID  is  complete. 

Proof.  The  result  follows  from  Theorem  7  together  with 
Lemmas  4,  15,  and  16.  □ 

Corollary 4. Theorem 3 in [PB, 2011] is incomplete.

Proof. Figure 1(c) demonstrates a selection diagram in
which the relation $R = P^*(y \mid do(x))$ is transportable, but
Theorem 3 is not capable of recognizing it.

Let  us  test  the  applicability  of  each  of  its  conditions: 

Step 1. R is not trivially transportable, due to the confounding
arc $X \leftrightarrow Z$ and Tian's identifiability condition [Tian
and Pearl, 2002];

Step 2. There is no S-admissible set, because of the confounding
arc $V \leftrightarrow Y$ and Verma's inducing path condition
[Verma and Pearl, 1990];

Step 3. There is no set W which makes $(X \perp\!\!\!\perp Y \mid W)$
hold, due to the confounding arc $X \leftrightarrow Y$.

Since there are no remaining actions to be taken, the algorithm
exits without returning any expression. □